Using r/LivestreamFail to understand Twitch streamers, chat, and emotes.
Recently, Twitch literature has begun characterizing Twitch communities through Twitch chat, viewership trends, and content, but no known projects have used resources outside of Twitch to understand how Twitch communities manifest and interact with one another.
LivestreamFail (LSF) is a subreddit dedicated to sharing Twitch clips, general Twitch news, and Twitch drama. LSF is one way smaller streamers get noticed, and it is a platform that I can use to compare big and small communities. I’m interested in how emote use differs between smaller and larger communities because I believe emote meanings and sentiments are being actively redefined. This analysis will be split into an LSF part and a Twitch emote-sentiment part.
This analysis will investigate users, posts, and comments (sentiment and topic) on r/LivestreamFail and then investigate the chat data and emote use from Twitch clips that were featured in LSF posts.
Unfold the code block to see which libraries are used.
library(pacman)
p_load(tidyverse,
tidytext,
tm,
lubridate,
stringr,
text2vec,
jsonlite,
widyr,
quanteda,
visNetwork,
igraph,
ggraph,
DT,
ggthemes,
readtext)
LSF STUFF SHOULD GO HERE.
data <- read_csv("C:/Users/macia/Documents/MSIA-19/Git/Reddit-and-Twitch/Data Collection/chats.csv",col_types = cols(X1 = col_skip()))
head(data, n = 10)
## # A tibble: 10 x 4
## body user date streamer
## <chr> <chr> <dttm> <chr>
## 1 +2 blakeleigh321 2017-01-04 23:24:33 Jerma985
## 2 OMEGALUL Humorous_Chimp 2013-03-29 16:35:36 Jerma985
## 3 OUT OF MOJO mayodongs 2019-10-02 15:48:12 Jerma985
## 4 DOOR OMEGALUL mprQQ 2016-01-05 14:07:25 Jerma985
## 5 OMEGALUL Clap zRaayan 2014-04-05 09:57:48 Jerma985
## 6 SPIDERMAN gum100 2011-03-11 18:28:18 Jerma985
## 7 +2 craigwhoisme 2019-12-17 21:08:26 Jerma985
## 8 OMEGALUL feralbutch 2019-01-30 19:27:07 Jerma985
## 9 LUL Angle_Dez 2014-07-23 23:43:32 Jerma985
## 10 OMEGALUL Pretendeer 2012-11-24 17:01:36 Jerma985
glimpse(data)
## Rows: 46,319
## Columns: 4
## $ body <chr> "+2", "OMEGALUL", "OUT OF MOJO", "DOOR OMEGALUL", "OMEGALU...
## $ user <chr> "blakeleigh321", "Humorous_Chimp", "mayodongs", "mprQQ", "...
## $ date <dttm> 2017-01-04 23:24:33, 2013-03-29 16:35:36, 2019-10-02 15:4...
## $ streamer <chr> "Jerma985", "Jerma985", "Jerma985", "Jerma985", "Jerma985"...
tl;dr: Twitch DMCA issues, streamer bans, and the app I used prevented me from downloading a lot more data. Next time, I’ll work with the Twitch API directly.
Reddit data was gathered using Python and PRAW (Python Reddit API Wrapper) to collect recent posts from r/LivestreamFail (October 2020). This resulted in over 900 Reddit posts. This data was then used in R to scrape links and record whether each clip had chat available for download.
To actually download the Twitch chat, I used an application by lay295 and zigagrcar on GitHub, found here.
The Digital Millennium Copyright Act (DMCA) is affecting Twitch in a big way.
Twitch is currently in hot water over DMCA claims and is banning streamers for repeatedly streaming “copyrighted” songs. One method streamers use to combat this is deleting their content shortly after it is broadcast. This affected data collection because the Twitch clips I was collecting were being actively taken down.
This led to the collection of Twitch chat from 227 links present in the Reddit posts.
R and RSelenium were used to scrape the emote data from FrankerFaceZ (FFZ) and BetterTTV (BTTV). Roughly the top 300 emotes from each site were collected (emote name and link to image).
This is what a “busy” chat may look like.
This is a data table with the names and images of the top 600 emotes from both sites (BTTV & FFZ) combined.
The BTTV emotes need to be updated; I am also not sure whether GIF emotes work.
# https://i.stack.imgur.com/kLMaS.jpg
# Build an <img> tag for each emote so the data table renders the image itself
emote_table <- emote_data %>%
mutate(emote_image = paste0('<img src="', emote_link, '" height="52">')) %>%
select(emote_name, emote_image)
# escape = FALSE lets DT render the HTML instead of showing it as text
datatable(emote_table, escape = FALSE)
This chart shows us how many unique chat lines there are per streamer. This metric is useful for understanding which streamers may be getting the most attention on LSF at a given point in time (October in this case). This metric should later be controlled for clip length, since longer clips offer more opportunity for chat engagement.
data %>% group_by(streamer) %>% count(sort = T)%>%
head(n=10) %>%
ggplot(aes(x = reorder(streamer,-n), y = n))+
geom_col()+
theme_wsj(base_size = 12, color = "green")+
theme(axis.text.x = element_text(size = 12, angle = 15,vjust = .55))+
labs(title = "Which streamer has the most chats?")
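The clip-length control mentioned above could look something like the following. This is only a sketch on toy data: the `clips` table and its `clip_length_sec` column are hypothetical (clip durations were not part of the collected data), and it uses base R so that it runs on its own.

```r
# Toy data: one row per chat line, mirroring the `streamer` column in `data`.
chat <- data.frame(
  streamer = c("A", "A", "A", "A", "B", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical total clip length per streamer, in seconds (an assumption,
# this column does not exist in the collected data).
clips <- data.frame(
  streamer = c("A", "B"),
  clip_length_sec = c(60, 20),
  stringsAsFactors = FALSE
)

# Count chat lines per streamer.
chat_counts <- as.data.frame(table(streamer = chat$streamer),
                             responseName = "n", stringsAsFactors = FALSE)

# Join the two tables and express engagement as chat lines per minute of clip.
chat_rate <- merge(chat_counts, clips, by = "streamer")
chat_rate$chats_per_min <- chat_rate$n / (chat_rate$clip_length_sec / 60)
chat_rate <- chat_rate[order(-chat_rate$chats_per_min), ]
print(chat_rate)
```

In this toy example streamer A has more chat lines in total, but streamer B has the busier chat once clip length is taken into account.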
This visualization shows us the most active Twitch chatters in our dataset. In a larger dataset, finding those high-interaction chatters may be useful for drawing links between communities or even creating contributor badges on Twitch (like the founders badge).
# This creates a !%in% kind of deal
`%notin%` <- Negate(`%in%`)
data %>% group_by(user) %>% filter(user %notin% c("StreamElements","Streamlabs","Nightbot")) %>% count(sort = T) %>%
head(n=10)%>%
ggplot(aes(x = reorder(user,-n), y = n))+
geom_col()+
theme_wsj(base_size = 12, color = "green")+
theme(axis.text.x = element_text(size = 8, angle = 15,vjust = .55),
plot.title = element_text(size = 20))+
labs(title = "Which user has the most chats?")
# Streamelements and streamlabs are bots.
This plot gives us further insight into the demographics of the communities of the top 5 streamers. It shows, for each streamer, the account creation year of the users in chat (counted once per chat line). As an example, one conclusion that may be drawn is that streamers forsen and Mizkif are not attracting new accounts (new users/ban evaders) to their channels. Another conclusion that may be drawn is that Trainwreckstv attracted a lot of new users in 2018 and perhaps played a significant role in bringing new users to Twitch. I should investigate further to understand what happened with Trainwreckstv in 2018. This was perhaps his drama year with MitchJones (a popular WoW streamer) or The Speech.
top_5_streamers <- data %>% group_by(streamer) %>% count(sort = T) %>% head(n=5) %>% distinct(streamer)
data %>% filter(streamer %in% top_5_streamers$streamer) %>% mutate(date_year = year(date)) %>% group_by(date_year,streamer) %>% count(sort = T)%>%
ggplot(aes(x = date_year, y = n, color = streamer)) +
geom_line(size = 2)+
theme_wsj(base_size = 12, color = "green")+
labs(title = "Streamer Communities: Account Creation Dates", subtitle = "Top 5 Streamers")+
theme(plot.title = element_text(size = 15),plot.subtitle = element_text(size= 8),legend.title = element_blank(),legend.position = "bottom")
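Because the counts above are per chat line, a very active chatter can weigh heavily on a single year. A possible refinement, sketched here on toy data with base R, is to count each account once per streamer and to look at the share of accounts per creation year. The column names mirror `data`; the values and the `accounts` and `by_year` objects are made up for illustration.

```r
# Toy chat log: `date` plays the role of the account creation date in `data`.
chat <- data.frame(
  user = c("u1", "u1", "u2", "u3", "u4", "u4", "u5"),
  streamer = c("A", "A", "A", "A", "B", "B", "B"),
  date = as.POSIXct(c("2015-03-01", "2015-03-01", "2018-06-10", "2018-07-01",
                      "2012-01-05", "2012-01-05", "2018-02-02"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Keep one row per (user, streamer) so heavy chatters are not over-counted.
accounts <- unique(chat[, c("user", "streamer", "date")])
accounts$date_year <- as.integer(format(accounts$date, "%Y"))

# Number of distinct accounts per streamer and creation year.
by_year <- as.data.frame(
  table(streamer = accounts$streamer, date_year = accounts$date_year),
  responseName = "n_accounts", stringsAsFactors = FALSE
)

# Share of each streamer's chatters whose account was created in that year.
by_year$share <- by_year$n_accounts /
  ave(by_year$n_accounts, by_year$streamer, FUN = sum)
print(by_year)
```

Using shares rather than raw counts also makes streamers with very different chat volumes easier to compare on one axis.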
data %>%
add_count(user,streamer) %>%
filter(n>1,user %notin% c("StreamElements","Streamlabs","Nightbot","Fossabot")) %>%
group_by(user) %>%
add_count(n_distinct(streamer)) %>%
ungroup() %>%
top_n(`n_distinct(streamer)`, n = 2) %>%
arrange(desc(n)) %>%
filter(n >=20) %>%
add_count(user,streamer,name = "#_chat_per_clip") %>%
ggplot(aes(x = user, y = `#_chat_per_clip`, fill = streamer))+
geom_bar(position = "stack",stat = "identity")
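The idea behind the stacked bars, finding users who chat in more than one streamer's community, can be reduced to a few steps. The following is a simplified base R sketch on toy data, not a rewrite of the pipeline above; the bot list is the one used in this analysis, everything else is made up.

```r
# Toy chat log: one row per chat line.
chat <- data.frame(
  user = c("u1", "u1", "u1", "u2", "u2", "Nightbot", "Nightbot", "u3"),
  streamer = c("A", "B", "B", "A", "A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Drop known bots, which appear in many chats but are not community members.
bots <- c("StreamElements", "Streamlabs", "Nightbot", "Fossabot")
humans <- chat[!(chat$user %in% bots), ]

# Count in how many distinct streamer chats each user appears.
user_streamer <- unique(humans[, c("user", "streamer")])
n_streamers <- table(user_streamer$user)
shared <- names(n_streamers)[as.vector(n_streamers) > 1]

# Chat lines per streamer for the users who bridge communities.
bridge <- humans[humans$user %in% shared, ]
shared_counts <- as.data.frame(
  table(user = bridge$user, streamer = bridge$streamer),
  responseName = "n_chats", stringsAsFactors = FALSE
)
print(shared_counts)
```

Here only `u1` chats in two communities, so it is the only user that would be drawn as a stacked bar.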
Tokens, bigrams, and trigrams can give us insight into popular emote/word combinations and spams that occur in these chats.
tokens %>% group_by(word) %>% count(sort = T)%>%
head(n=10) %>%
ggplot(aes(x= reorder(word,-n),y=n))+
geom_col()+
theme_wsj(base_size = 12, color = "green")+
theme(axis.text.x = element_text(angle = 25))+
labs(title ="Token Counts")
data %>%
unnest_tokens(bigram,body,token = 'ngrams',n = 2)%>%
filter(str_detect(bigram,"^[:alpha:]")) %>%
group_by(bigram) %>% count(sort = T)%>%
head(n=10) %>%
ggplot(aes(x= reorder(bigram,-n),y=n))+
geom_col()+
theme_wsj(base_size = 12, color = "green")+
theme(axis.text.x = element_text(angle = 25,size = 9))+
labs(title ="Bigram Counts")
data %>%
unnest_tokens(trigram,body,token = 'ngrams',n = 3)%>%
filter(str_detect(trigram,"^[:alpha:]")) %>%
group_by(trigram) %>% count(sort = T)%>%
head(n=10) %>%
ggplot(aes(x= reorder(trigram,-n),y=n))+
geom_col()+
theme_wsj(base_size = 12, color = "green")+
theme(axis.text.x = element_text(angle = 25,size = 9))+
labs(title ="Trigram Counts")
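For readers unfamiliar with n-grams, here is a rough base R sketch of what the tokens, bigrams, and trigrams above represent: every run of `n` consecutive words in a chat line. `make_ngrams()` is a hypothetical helper written for this illustration. It only lowercases and splits on whitespace, so it is a simplification of `unnest_tokens()`, which also handles punctuation.

```r
# Return all runs of `n` consecutive words in a single chat message.
make_ngrams <- function(text, n) {
  words <- strsplit(tolower(trimws(text)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Toy chat messages.
msgs <- c("OMEGALUL OMEGALUL Clap", "OMEGALUL Clap", "LUL")

# n = 1 gives tokens, n = 2 bigrams, n = 3 trigrams.
bigram_counts <- sort(table(unlist(lapply(msgs, make_ngrams, n = 2))),
                      decreasing = TRUE)
print(bigram_counts)
```

Messages shorter than `n` words, such as the single "LUL" above, contribute no n-grams at all.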
As you can see, there are a lot of duplicated words/phrases. I like to think of these as spams.
Some popular ones are “I was here Pogu I was here Pogu…” or “OMEGALUL OMEGALUL OMEGALUL OMEGALUL..”. One way to combat this is by stripping the chat text to only its unique words.
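One way the stripping step could be done is sketched below in base R. `dedupe_words()` is a hypothetical helper, not necessarily how the corpus file loaded next was produced. It keeps the first occurrence of each word and preserves case, since emote names are case-sensitive.

```r
# Collapse each chat message to its unique words, keeping first-seen order.
dedupe_words <- function(text) {
  vapply(strsplit(trimws(text), "\\s+"),
         function(words) paste(unique(words), collapse = " "),
         character(1), USE.NAMES = FALSE)
}

spam <- c("OMEGALUL OMEGALUL OMEGALUL OMEGALUL",
          "I was here Pogu I was here Pogu",
          "DOOR OMEGALUL")
print(dedupe_words(spam))
```

Messages without repeated words, such as "DOOR OMEGALUL", pass through unchanged.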
corpus_data <- readtext("C:/Users/macia/Documents/MSIA-19/Git/Reddit-and-Twitch/Data Collection/corpus_data.csv", text_field = 'text')
glimpse(corpus_data)
## Rows: 35,303
## Columns: 3
## $ doc_id <chr> "corpus_data.csv.1", "corpus_data.csv.2", "corpus_data.csv...
## $ text <chr> "+2", "OMEGALUL", "OUT OF MOJO", "DOOR OMEGALUL", "OMEGALU...
## $ streamer <chr> "Jerma985", "Jerma985", "Jerma985", "Jerma985", "Jerma985"...
corpus <- corpus(corpus_data)
dfm <- dfm(corpus, remove_punct=T)
# Select emotes
emotes = emote_data$emote_name
tags = dfm_select(dfm,pattern = emotes)
#tags
toptag = names(topfeatures(tags,30))
# These are the top emotes mentioned in the dataset from the list of popular bttv/ffz emotes
head(toptag)
## [1] "omegalul" "pogu" "lulw" "kekw" "pepelaugh" "clap"
tag_fcm <- fcm(tags)
toptags_fcm <- fcm_select(tag_fcm, pattern = toptag)
textplot_network(toptags_fcm, min_freq = 0.1, edge_alpha = 0.7, edge_size = 5)
tags = dfm_select(dfm, pattern = c("£","â","<","ó","ðÿ"),selection = "remove")
#tags
toptag = names(topfeatures(tags,20))
tag_fcm <- fcm(tags)
toptags_fcm <- fcm_select(tag_fcm, pattern = toptag)
textplot_network(toptags_fcm, min_freq = 0.1, edge_alpha = 0.7, edge_size = 5)
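The networks above are drawn from a feature co-occurrence matrix. To make that idea concrete, here is a simplified base R sketch that counts in how many messages two words appear together. It is a boolean, per-message simplification on toy data; quanteda's `fcm()` may differ in details such as how repeated words within a message are counted.

```r
# Toy chat messages.
msgs <- c("OMEGALUL Clap", "EZ Clap", "OMEGALUL Clap", "KEKW")

# Unique lowercase words per message, and the overall vocabulary.
tokens_list <- lapply(strsplit(tolower(msgs), "\\s+"), unique)
vocab <- sort(unique(unlist(tokens_list)))

# Binary message-by-word matrix: 1 if the word occurs in the message.
dfm_bin <- t(vapply(tokens_list,
                    function(words) as.integer(vocab %in% words),
                    integer(length(vocab))))
colnames(dfm_bin) <- vocab

# crossprod() gives word-by-word counts of messages containing both words.
cooc <- crossprod(dfm_bin)
diag(cooc) <- 0  # ignore a word co-occurring with itself
print(cooc)
```

Each non-zero cell becomes an edge in the network plot, and a word like "kekw" that never appears alongside another word ends up isolated.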
The previous plots don’t show us the direction of the combinations. Direction is useful for understanding the order of emotes/spams; otherwise, one may think that Clap EZ is an acceptable spam.
count_bigrams <- function(data) {
data %>%
unnest_tokens(bigram,"body", token = "ngrams", n = 2) %>%
separate(bigram, c("word1", "word2"), sep = " ") %>%
count(word1, word2, sort = TRUE)
}
visualize_bigrams <- function(bigrams) {
set.seed(2020)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))
bigrams %>%
graph_from_data_frame() %>%
ggraph(layout = "fr") +
geom_edge_link(aes(edge_alpha = n), show.legend = FALSE, arrow = a) +
geom_node_point(color = "lightblue", size = 5) +
geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
theme_void()
}
viz.bigrams <- data %>%
count_bigrams()
# keep only combinations that occur more than 70 times, then produce the graph
viz.bigrams %>%
filter(n >70) %>%
visualize_bigrams()
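To illustrate why direction matters, the sketch below counts ordered word pairs on toy data in base R, so that "EZ Clap" and "Clap EZ" stay separate edges. `directed_bigrams()` is a hypothetical helper for this illustration and preserves case, unlike `unnest_tokens()`.

```r
# Split one message into ordered (word1 -> word2) pairs.
directed_bigrams <- function(text) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  if (length(words) < 2) return(NULL)
  data.frame(word1 = head(words, -1), word2 = tail(words, -1),
             stringsAsFactors = FALSE)
}

msgs <- c("EZ Clap", "EZ Clap", "Clap EZ", "OMEGALUL Clap")
pairs <- do.call(rbind, lapply(msgs, directed_bigrams))

# Count each ordered pair; (EZ, Clap) and (Clap, EZ) are kept apart.
edge_counts <- aggregate(n ~ word1 + word2,
                         data = transform(pairs, n = 1L), FUN = sum)
edge_counts <- edge_counts[order(-edge_counts$n), ]
print(edge_counts)
```

In a directed graph built from `edge_counts`, the arrow from EZ to Clap would be the heavier one, which matches how the spam is actually typed.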
With this visualization, I aimed to uncover the “shared” emotes between streamers/communities. For example, we can see that the streamer EsfandTv shares a lot of emotes with different streamers and may share many community members with TrainwrecksTv.
word_cors <- tokens %>%
group_by(word) %>%
filter(n() >= 10 ) %>%
pairwise_cor(word, streamer, sort = TRUE)
top_10 <-word_cors %>% mutate("streamer" = case_when(
item2 == "trainwreckstv" ~ "trainwreckstv", # ahh,
item2 == "esfandtv" ~ "esfandtv",
item2 == "forsen" ~ "forsen",
item2 == "mizkif" ~ "mizkif",
item2 == "ludwig" ~ "ludwig",
item2 == "moonmoon" ~ "moonmoon",
item2 == "xqcow" ~ "xqcow",
item2 == "sykkuno" ~ "sykkuno",
item2 == "vadikus007" ~ "vadikus007",
item2 == "loltyler1" ~ "loltyler1",
TRUE ~ "WHO?"
)) # there are 83 unique streamers in the dataset; we should filter this somehow, either to the top 20 or to streamers with more than 100 chats.
# Build a scraper that grabs the names of emotes for each of the streamers?
#top_10 %>% group_by(streamer) %>% count() # strange numbers here: each streamer has the same number? Because I filter for the top 10 above?
# from 2 mil rows
# to about 2k rows
test<-top_10 %>%
mutate(contains_emote = case_when(item1 %in% emote_data$emote_name ~ 1, TRUE ~ 0)) %>%
filter(contains_emote == 1) %>% # filtering for only emotes!
filter(streamer != "WHO?")%>%
group_by(streamer) %>% top_n(10,wt = correlation)
streamers = c("trainwreckstv","esfandtv","forsen","mizkif","ludwig","moonmoon","xqcow","sykkuno","vadikus007","loltyler1")
emote_data_1 <- emote_data %>% select(emote_name,emote_link) %>% na.omit()
# TEST 2
test_2 <- test %>% left_join(emote_data_1, by = c("item1" = "emote_name"))
#-------
test <- test %>% graph_from_data_frame()
test_viz <- toVisNetworkData(test)
test_viz$nodes <- test_viz$nodes %>% mutate("group" = case_when(label %in% streamers ~ "Streamer",TRUE ~ "Emote"))
#test_viz checking the dataframe
visNetwork(nodes = test_viz$nodes, edges = test_viz$edges, main = "Emote correlation to Streamer")%>%
visGroups(groupname = "Streamer", color = "green", shape = "square") %>%
visGroups(groupname = "Emote", color = "blue")%>%
visOptions(highlightNearest = list(enabled = T, hover = T))%>%
visLegend()
#--------------------
test_2 <- test_2 %>% graph_from_data_frame()
test_viz_2 <- toVisNetworkData(test_2)
test_viz_2$nodes <- test_viz_2$nodes %>% mutate("group" = case_when(label %in% streamers ~ "Streamer",TRUE ~ "Emote"),
"shape" = "image")
test_viz_2$nodes<-test_viz_2$nodes %>% left_join(test_viz_2$edges, by = c('id' = 'from')) %>% select(id,label,group,shape,emote_link) %>% rename(image = emote_link) %>% distinct(id, .keep_all = T)
test_viz_2$nodes<-test_viz_2$nodes %>% mutate(image = case_when(id == "xqcow" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/xqcow-profile_image-9298dca608632101-70x70.jpeg",
id == "loltyler1" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/f3591dbe4ee3d94b-profile_image-70x70.png",
id == "moonmoon" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/3973e918fe7cc8c8-profile_image-70x70.png",
id == "trainwreckstv" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/1f47965f-7961-4b64-ad6f-71808d7d7fe9-profile_image-70x70.png",
id == "forsen" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/forsen-profile_image-48b43e1e4f54b5c8-300x300.png",
id == "esfandtv" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/476ee93d-66a6-4e57-b3a9-db1ceb168ad8-profile_image-70x70.png",
id == "mizkif" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/ddd88d33-6c4f-424f-9246-5f4978c93148-profile_image-70x70.png",
id == "ludwig" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/bde8aaf5-35d4-4503-9797-842401da900f-profile_image-70x70.png",
id == "sykkuno" ~ "https://static-cdn.jtvnw.net/jtv_user_pictures/sykkuno-profile_image-6ab1e70e07e29e9b-70x70.jpeg",
TRUE ~ as.character(image))) %>% na.omit()
#test_viz_2 checking the dataframe
visNetwork(nodes = test_viz_2$nodes, edges = test_viz_2$edges, main = "Emote correlation to Streamer")%>%
visGroups(groupname = "Streamer", color = "green", shape = "square") %>%
visGroups(groupname = "Emote", color = "blue")%>%
visOptions(highlightNearest = list(enabled = T, hover = T))%>%
visLegend()
#test_viz_2$nodes
# delete me at some point
test_viz$edges
test_viz$nodes
test_viz$nodes <- test_viz$nodes %>%
mutate("group" = case_when(label %in% streamers ~ "Streamer",TRUE ~ "Emote"),
"shape" = "image") %>%
inner_join(test_viz$edges, by = c("id" = "from"))
# Disabled: the pipe above was cut off, so these two calls would run on their own and fail.
# select(c(id,label,group,shape,emote_link))%>%
# mutate("image" = magick::image_read(emote_link))
#test_viz checking the dataframe
visNetwork(nodes = test_viz$nodes, edges = test_viz$edges, main = "Emote correlation to Streamer")%>%
visGroups(groupname = "Streamer", color = "green", shape = "square") %>%
visGroups(groupname = "Emote", color = "blue")%>%
visOptions(highlightNearest = list(enabled = T, hover = T))%>%
visLegend()
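For reference, the correlations feeding these networks come from `pairwise_cor(word, streamer)`. As far as I understand it, when no value column is given it treats each word as present or absent per streamer and correlates those binary columns. The base R sketch below reproduces that idea on toy data; it is an illustration under that assumption, not widyr's actual implementation.

```r
# Toy token table: which word was seen in which streamer's chat.
chat_tokens <- data.frame(
  word = c("kekw", "pogu", "kekw", "pogu", "lulw", "kekw", "lulw"),
  streamer = c("A", "A", "B", "B", "C", "D", "D"),
  stringsAsFactors = FALSE
)

# Streamer-by-word presence matrix: 1 if the word appears in that chat.
counts <- unclass(table(chat_tokens$streamer, chat_tokens$word))
presence <- (counts > 0) * 1

# Pearson correlation between binary columns (the phi coefficient).
word_cor <- cor(presence)
print(round(word_cor, 2))
```

Words that always show up in the same chats correlate positively, while words that never share a chat, like "pogu" and "lulw" here, correlate negatively.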